Biological vision


Adaptive Contextual Perception: How To Generalize To New Backgrounds and Ambiguous Objects

Neural Information Processing Systems

Biological vision systems make adaptive use of context, recognizing objects in settings with novel contexts as well as occluded or blurry objects in familiar settings. In this paper, we investigate how vision models adaptively use context for out-of-distribution (OOD) generalization and leverage our analysis to improve model OOD generalization. First, we formulate two distinct OOD settings in which context is either beneficial (Object-Disambiguation) or irrelevant (Background-Invariance), reflecting the diverse contextual challenges faced in biological vision. We then analyze model performance in these two settings and demonstrate that models that excel in one tend to struggle in the other. Notably, prior work on learning causal features improves performance in one setting but hurts it in the other.


Recurrent neural network dynamical systems for biological vision

Neural Information Processing Systems

In neuroscience, recurrent neural networks (RNNs) are modeled as continuous-time dynamical systems to more accurately reflect the dynamics inherent in biological circuits. However, convolutional neural networks (CNNs) remain the preferred architecture in vision neuroscience because they efficiently process visual information, which comes at the cost of the biological realism provided by RNNs. To address this, we introduce a hybrid architecture that integrates the continuous-time recurrent dynamics of RNNs with the spatial processing capabilities of CNNs. Our models preserve the dynamical characteristics typical of RNNs while achieving performance comparable to their conventional CNN counterparts on benchmarks such as ImageNet. Compared to conventional CNNs, our models demonstrate increased robustness to noise, owing to the noise-suppressing mechanisms inherent in recurrent dynamical systems.
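The continuous-time recurrence described in this abstract can be sketched with an Euler discretization of a leaky recurrent unit, tau * dh/dt = -h + relu(w_in * x + w_rec * h). This is only a minimal single-unit illustration of the general idea: the scalar gains `w_in` and `w_rec` are hypothetical stand-ins for the input and recurrent convolution kernels a real hybrid CNN-RNN model would use.

```python
def euler_rnn_step(h, x, w_in=1.0, w_rec=0.3, tau=1.0, dt=0.1):
    """One Euler step of the continuous-time dynamics
    tau * dh/dt = -h + relu(w_in * x + w_rec * h).
    w_in and w_rec are scalar stand-ins for conv kernels (assumption)."""
    drive = max(0.0, w_in * x + w_rec * h)  # ReLU nonlinearity
    return h + (dt / tau) * (-h + drive)

def settle(x, steps=300):
    """Iterate the dynamics from rest until (approximately) steady state."""
    h = 0.0
    for _ in range(steps):
        h = euler_rnn_step(h, x)
    return h
```

At the fixed point h* = relu(x + 0.3 * h*), so a constant input x = 0.7 settles near h* = 1.0; the leaky -h term that pulls transients back toward this fixed point is one simple source of the noise suppression the abstract attributes to recurrent dynamical systems.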


Better artificial intelligence does not mean better models of biology

Linsley, Drew, Feng, Pinyuan, Serre, Thomas

arXiv.org Artificial Intelligence

Vision science has always developed models at smaller scales than the frontier of artificial intelligence. This is partially because of the academic roots of vision science, partially because of a well-founded desire to lean on reductionism to truly understand how vision works, and partially because attempts at incorporating biological inspiration into DNNs have been hamstrung by implementations that are poorly suited for GPUs. For example, most attempts at biologically-inspired DNNs have focused on inducing architectural constraints like recurrence [34,61,116-118] and different forms of feedback [59,60] that are not explicitly included in DNNs but known to play key roles in primate vision [119-123]. While we believe these approaches are important for Neuroscience and especially for constraining model hypothesis spaces in small-data settings, the methods used for implementation are undeniably challenging to scale [118,124], and it is possible that the induced computational strategies could be learned by a less-constrained DNN trained with the "right" data and objective [125]. Thus, a more effective approach to reverse-engineering vision than hand-designing small-scale recurrent DNNs may be to train DNNs at large scales with approximations of the kinds of data and routines that shape biological visual systems.


Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

This paper uses a convolutional network to generate textures, with the aim of producing stimuli that elicit predictable neural responses in different parts of the visual pathway, assuming the pathway works like the model. The method matches the correlations of CNN activations between a target image and a synthesized image, up to some layer of the network. Hopefully this will lead to fruitful experiments. From past work, it is clear that an image constrained by summary statistics of a deep-net representation of a target will resemble that target more closely as more of those statistics are used. This paper does a nice job of showing that convincing textures can be generated with this technique, and that using statistics from more layers improves the results.
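The correlation-matching the review describes can be sketched as a Gram-matrix loss over CNN feature maps (the Gatys-style texture objective): for each layer, compute channel-by-channel correlations of the activations and penalize the squared difference between target and synthesized statistics. The list-of-lists representation below is a simplification; a real implementation would operate on actual CNN feature tensors.

```python
def gram_matrix(feats):
    """feats: list of C channel responses, each a list of N spatial values.
    Returns the C x C matrix of channel correlations, normalized by N --
    the summary statistics matched during synthesis."""
    n = len(feats[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in feats]
            for ci in feats]

def texture_loss(target_layers, synth_layers):
    """Sum of squared Gram-matrix differences across the CNN layers used
    for synthesis (Gatys-style texture matching, the procedure the
    review describes)."""
    loss = 0.0
    for t, s in zip(target_layers, synth_layers):
        gt, gs = gram_matrix(t), gram_matrix(s)
        loss += sum((a - b) ** 2 for rt, rs in zip(gt, gs)
                    for a, b in zip(rt, rs))
    return loss
```

Synthesis then amounts to gradient descent on the synthesized image's pixels to drive this loss toward zero; including Gram matrices from more layers constrains more of the representation, which is the review's point about resemblance improving with more statistics.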



Fixing the problems of deep neural networks will require better training data and learning algorithms

Linsley, Drew, Serre, Thomas

arXiv.org Artificial Intelligence

Over the past decade, vision scientists have turned to deep neural networks (DNNs) to model biological vision. The popularity of DNNs comes from their ability to rival human performance on visual tasks [1] and the seemingly concomitant correspondence of their hidden units with biological vision [2]. Bowers and colleagues [3] marshal evidence from psychology and neuroscience to argue that while DNNs and biological systems may achieve similar accuracy on visual benchmarks, they often do so by relying on qualitatively different visual features and strategies [4-6]. Based on these findings, Bowers and colleagues call for a re-evaluation of what DNNs can tell us about biological vision and suggest dramatic adjustments going forward, potentially even moving on from DNNs altogether. Are DNNs poorly suited to model biological vision?


Event cameras and representation learning improve visuomotor policies inspired by biological vision

#artificialintelligence

Editor's note: This research was conducted by Sai Vemprala, Senior Researcher, and Ashish Kapoor, Partner Researcher, of Microsoft Research, along with Sami Mian, who was a PhD researcher at the University of Pittsburgh and an intern at Microsoft at the time of the work. Autonomous systems are composed of complex perception-action loops, where observations of the world need to be processed in real time to produce safe and effective actions. A significant amount of research has focused on creating perception and navigation algorithms for such systems, often using visual data from cameras to reason about which action to take depending on the platform and task at hand. While there have been many improvements in how this reasoning is performed and how information can be extracted efficiently from camera imagery, a number of challenges remain in building autonomous systems that receive and process information both accurately and quickly enough for real-world applications. These challenges include the speed limitations of commercial off-the-shelf cameras, data that is unseen during training of vision models, and the limitations of RGB camera sensors.


What deep learning can tell us about higher cognitive functions like mindreading?

Aru, Jaan, Vicente, Raul

arXiv.org Artificial Intelligence

We will first briefly consider how DL has contributed to the research on visual object recognition. In the main part, we will assess whether DL could also help us to clarify the computations underlying higher cognitive functions such as Theory of Mind. In addition, we will compare the objectives and learning signals of brains and machines, leading us to conclude that simply scaling up the current DL algorithms will most likely not lead to human-level Theory of Mind.


Discriminant Saliency for Visual Recognition from Cluttered Scenes

Gao, Dashan, Vasconcelos, Nuno

Neural Information Processing Systems

Saliency mechanisms play an important role when visual recognition must be performed in cluttered scenes. We propose a computational definition of saliency that deviates from existing models by equating saliency to discrimination. In particular, the salient attributes of a given visual class are defined as the features that enable the best discrimination between that class and all other classes of recognition interest. It is shown that this definition leads to saliency algorithms of low complexity that are scalable to large recognition problems and compatible with existing models of early biological vision. Experimental results demonstrating success on challenging recognition problems are also presented.
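The idea of equating saliency to discrimination can be illustrated with a toy feature-scoring sketch: score each candidate feature by how well it separates one class from the rest, and call the top-scoring features salient. The Fisher-style score below (between-class over within-class variance) is a hypothetical stand-in for the paper's actual discriminability criterion, which is not reproduced here.

```python
def fisher_score(values, labels):
    """Between-class over within-class variance for one feature.
    A simple Fisher-criterion stand-in (assumption) for the paper's
    discriminability measure."""
    classes = set(labels)
    overall = sum(values) / len(values)
    between = within = 0.0
    for c in classes:
        group = [v for v, l in zip(values, labels) if l == c]
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall) ** 2
        within += sum((v - mean) ** 2 for v in group)
    return between / (within + 1e-12)

def most_salient(features, labels):
    """Index of the feature that best discriminates the classes,
    i.e. the 'salient' feature under saliency-as-discrimination."""
    scores = [fisher_score(f, labels) for f in features]
    return scores.index(max(scores))
```

Because scoring is one cheap pass per feature, this kind of criterion scales to large feature sets, consistent with the low-complexity claim in the abstract.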

